Linking the GEO Data Sharing and Data Management Principles to other Reference Lifecycles and Principles
https://bit.ly/2022-09-05_GEO-Principles

Karl Benedict - University of New Mexico
Earth Science Information Partners

2022-09-05

GEO Working Group: Data Working Group (Data-WG)
Subgroup: Data Sharing and Data Management Principles (Data-WG/DSDMP). The following subgroup members provided invaluable input into the approach used in this analysis: Bente Lija Bye, Eugenio Trumpy, Chris Jarvis, Jose Miguel Rubio Iglesias, Ethan McMahon, Robert R Downs, Chris Shubert, Sebastian Claus, Paula De Salvo

Project Objective

Increase understanding, within the Data-WG and the broader GEO community, of the relationship between the GEO Data Sharing (pg. 11) and Data Management Principles (pg. 10) (hereafter DSDMP) and other data lifecycle models and reference principles (hereafter reference frameworks) developed since the GEO principles were established as part of the 2016-2025 GEO Strategic Plan.

This work also complements the Revised GEO Data Sharing and Data Management Principles document, which the GEO Secretariat submitted to the Programme Board for decision (June 2022).

  • Identify elements of the reference frameworks that the DSDMP concepts do not cover
  • Inform discussions on further development of the DSDMP with specific insights gained from the gap identification
  • Enable clearer communication of the DSDMP to audiences familiar with the reference frameworks by showing how those frameworks connect to the DSDMP

Lifecycles and Principles

GEO Data Sharing Principles

  • DSP 1: Open by default
    • DSP 1.1: Shared without charge
    • DSP 1.2: No restrictions on reuse
    • DSP 1.3: Conditions limited to registration & attribution
    • DSP 1.4: Default sharing standard through GEOSS Data-CORE
  • DSP 2: Possibility of sharing with restrictions
    • DSP 2.1: Recognition of sharing with restrictions as an exception to Principle 1
    • DSP 2.2: As few restrictions as possible
    • DSP 2.3: Imposed by “international instruments, national policies or legislation”
    • DSP 2.4: Limit charges to necessary cost recovery
  • DSP 3: Minimum of time delay
    • DSP 3.1: On a near-real-time basis whenever necessary or practicable

GEO Data Management Principles

  • DMP-1: Metadata for Discovery
  • DMP-2: Online Access
  • DMP-3: Data Encoding
  • DMP-4: Data Documentation
  • DMP-5: Data Traceability
  • DMP-6: Data Quality-Control
  • DMP-7: Data Preservation
  • DMP-8: Data and Metadata Verification
  • DMP-9: Data Review and Reprocessing
  • DMP-10: Persistent and Resolvable Identifiers

Reference Lifecycle Frameworks

DataONE Data Lifecycle
(DataONE - preliminary connections defined)

Reference Principles

FAIR Principles
(FAIR - preliminary connections defined)
CARE Principles
(CARE - not yet completed)

TRUST Principles
(TRUST - preliminary connections defined)

Preliminary Results - Sample

The following table summarizes the connections defined thus far between the GEO DSDMP and the reference frameworks.

| Data Management Principle | Principle Description | Lifecycle-EDMF | Lifecycle-NSTC | Lifecycle-EEA | Lifecycle-NIST | Lifecycle-DataONE | Principle-FAIR | Principle-TRUST | Principle-CARE |
|---|---|---|---|---|---|---|---|---|---|
| DMP-1: Metadata for Discovery | Data and all associated metadata will be discoverable, through catalogues and search engines, and data access and use conditions, including licenses, will be clearly indicated. | | NSTC-A | EEA-J | NIST-B | DataONE-A, DataONE-D, DataONE-F | FAIR-F2, FAIR-F4, FAIR-A2, FAIR-R1.1 | TRUST-U, TRUST-Tr | |
| DMP-2: Online Access | Data will be accessible via online services, including, at a minimum, direct download but preferably user-customizable services for access, visualization and analysis. | | NSTC-B | EEA-K | NIST-B | DataONE-A, DataONE-F, DataONE-G | FAIR-A1, FAIR-A2 | TRUST-R, TRUST-U | |
| DMP-3: Data Encoding | Data should be structured using encodings that are widely accepted in the target user community and aligned with organizational needs and observing methods, with preference given to non-proprietary international standards. | | NSTC-D | EEA-H | NIST-A, NIST-B, NIST-D, NIST-E | DataONE-A, DataONE-E, DataONE-G, DataONE-H | FAIR-I1, FAIR-I2, FAIR-I3, FAIR-R1.3 | TRUST-R, TRUST-U | |
| DMP-4: Data Documentation | Data will be comprehensively documented, including all elements necessary to access, use, understand, and process, preferably via formal structured metadata based on international or community-approved standards. To the possible extent, data will be described in peer-reviewed publications referenced in metadata records. | | NSTC-C | EEA-J | NIST-B, NIST-C, NIST-D, NIST-E | DataONE-A, DataONE-D, DataONE-F, DataONE-G, DataONE-H | FAIR-F2, FAIR-I1, FAIR-I2, FAIR-I3, FAIR-R1, FAIR-R1.3 | TRUST-R, TRUST-U | |
| DMP-5: Data Traceability | Data will include provenance metadata indicating the origin and processing history of raw observations and derived products, to ensure full traceability of the product chain. | | | EEA-J | NIST-D, NIST-E | DataONE-A, DataONE-B, DataONE-C, DataONE-D, DataONE-H | FAIR-F1, FAIR-R1.2 | | |
| DMP-6: Data Quality-Control | Data will be quality-controlled and the results of quality control shall be indicated in metadata; data made available in advance of quality control will be flagged in metadata as unchecked. | | | EEA-I | NIST-C, NIST-D | DataONE-A, DataONE-B, DataONE-C, DataONE-D | | TRUST-U | |
| DMP-7: Data Preservation | Data will be protected from loss and preserved for future use; preservation planning will be for the long term and include guidelines for loss prevention, retention schedules, and disposal or transfer procedures. | | | EEA-L | NIST-D, NIST-F | DataONE-A, DataONE-E | FAIR-A2 | TRUST-Tr, TRUST-R, TRUST-Te | |
| DMP-8: Data and Metadata Verification | Data and associated metadata held in data management systems will be periodically verified to ensure integrity, authenticity and readability. | | | EEA-L | NIST-D, NIST-F | DataONE-B, DataONE-C, DataONE-D, DataONE-E, DataONE-F | | TRUST-R | |
| DMP-9: Data Review and Reprocessing | Data will be managed to perform corrections and updates in accordance with reviews, and to enable reprocessing as appropriate; where applicable this shall follow established and agreed procedures. | | | EEA-L | NIST-F | DataONE-B, DataONE-C, DataONE-E | | TRUST-R, TRUST-U | |
| DMP-10: Persistent and Resolvable Identifiers | Data will be assigned appropriate persistent, unique and resolvable identifiers to enable documents to cite the data on which they are based and to enable data providers to receive acknowledgement for use of their data. | | NSTC-A | | NIST-E | DataONE-A, DataONE-B, DataONE-D, DataONE-F, DataONE-G | FAIR-F1, FAIR-F3, FAIR-A1 | | |
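
The gap identification described in the project objective can be illustrated with a small sketch. The Python below transcribes only the FAIR column of the preliminary table above and compares it against the 15 published FAIR sub-principles. The names `FAIR_LINKS`, `FAIR_ELEMENTS`, `uncovered_elements`, and `unlinked_principles` are hypothetical and chosen for illustration; they are not part of the project's spreadsheet or tooling. The sketch also assumes that a link to a parent element (for example A1) does not automatically cover its children (A1.1, A1.2), which the analysis itself may treat differently.

```python
# FAIR connections per GEO Data Management Principle, transcribed from the
# preliminary table (codes shown without the "FAIR-" prefix).
FAIR_LINKS = {
    "DMP-1": {"F2", "F4", "A2", "R1.1"},
    "DMP-2": {"A1", "A2"},
    "DMP-3": {"I1", "I2", "I3", "R1.3"},
    "DMP-4": {"F2", "I1", "I2", "I3", "R1", "R1.3"},
    "DMP-5": {"F1", "R1.2"},
    "DMP-6": set(),
    "DMP-7": {"A2"},
    "DMP-8": set(),
    "DMP-9": set(),
    "DMP-10": {"F1", "F3", "A1"},
}

# Assumption: the full list of FAIR sub-principles, in their published order.
FAIR_ELEMENTS = [
    "F1", "F2", "F3", "F4",
    "A1", "A1.1", "A1.2", "A2",
    "I1", "I2", "I3",
    "R1", "R1.1", "R1.2", "R1.3",
]


def uncovered_elements(links, elements):
    """Return framework elements that no principle is linked to."""
    covered = set().union(*links.values())
    return [e for e in elements if e not in covered]


def unlinked_principles(links):
    """Return principles that have no link into the framework at all."""
    return [dmp for dmp, targets in links.items() if not targets]


print(uncovered_elements(FAIR_LINKS, FAIR_ELEMENTS))  # ['A1.1', 'A1.2']
print(unlinked_principles(FAIR_LINKS))                # ['DMP-6', 'DMP-8', 'DMP-9']
```

Under these assumptions the sketch reports gaps in both directions: FAIR elements with no linked GEO principle, and GEO principles with no FAIR link yet. The same two functions could be applied to the other framework columns once their full element lists are entered.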

GEO Data Management Principles Mapped to DataONE Lifecycle Elements

(Figure: GEO Data Management Principles - DataONE)

GEO Data Management Principles Mapped to NSTC Lifecycle Elements

(Figure: GEO Data Management Principles - NSTC)

GEO Data Management Principles Mapped to FAIR Principles Elements

(Figure: GEO Data Management Principles - FAIR)

GEO Data Management Principles Mapped to TRUST Principles Elements

(Figure: GEO Data Management Principles - TRUST)

Crosswalk of GEO, FAIR, and TRUST principles

(Figure: Crosswalk of GEO, FAIR, and TRUST principles)

Next Steps

While the current spreadsheet is a useful platform for capturing and sharing the initial mappings, its data structure does not scale: it cannot streamline the collection of data from multiple contributors or support cross-validation of the identified connections. Next steps for this project include the following:

  • Transition to a data model that captures and manages connection information from multiple contributors, enabling cross-validation of identified connections
  • Expand the data model to capture information about the nature of the connections
  • Develop an online dashboard that provides current connection information based on community-contributed data
  • Publish the results of the analysis in one or more Earth Science data publication venues
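
One possible shape for such a data model is sketched below in Python. It is an illustration only, not the project's chosen design: the `Connection` record, its `contributor` and `nature` fields, the `min_contributors` threshold, and the reviewer names in the sample data are all hypothetical. Each record stores one contributor's assertion that a GEO principle connects to a framework element, so agreement between contributors can be counted.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """One contributor's assertion that a principle maps to a framework element."""
    principle: str     # e.g. "DMP-1"
    element: str       # e.g. "FAIR-F2"
    contributor: str   # hypothetical contributor identifier
    nature: str = ""   # hypothetical free-text note on the kind of connection


def agreement(connections):
    """Map each (principle, element) pair to the set of contributors asserting it."""
    votes = defaultdict(set)
    for c in connections:
        votes[(c.principle, c.element)].add(c.contributor)
    return dict(votes)


def cross_validated(connections, min_contributors=2):
    """Return the pairs asserted by at least `min_contributors` distinct contributors."""
    return sorted(
        pair
        for pair, who in agreement(connections).items()
        if len(who) >= min_contributors
    )


# Invented sample records for illustration; not real contributor assessments.
records = [
    Connection("DMP-1", "FAIR-F2", "reviewer-a", "rich metadata supports discovery"),
    Connection("DMP-1", "FAIR-F2", "reviewer-b"),
    Connection("DMP-1", "FAIR-F4", "reviewer-a"),
]

print(cross_validated(records))  # [('DMP-1', 'FAIR-F2')]
```

Storing one row per contributor per connection, rather than one cell per principle as in the spreadsheet, is what would make cross-validation and a dashboard query straightforward in this sketch.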

Discussion Questions

  • What additional principles or lifecycle frameworks should we consider?
  • What additional applications can we support with these data?
  • Who should be invited to contribute their assessments of linkages?